
    Incremental refinement of image salient-point detection

    Low-level image analysis systems typically detect "points of interest", i.e., areas of natural images that contain corners or edges. Most of the robust and computationally efficient detectors proposed for this task use the autocorrelation matrix of the localized image derivatives. Although the performance of such detectors and their suitability for particular applications have been studied in the relevant literature, their behavior under limited input source (image) precision or limited computational or energy resources is largely unknown. All existing frameworks assume that the input image is readily available for processing and that sufficient computational and energy resources exist for the completion of the result. Nevertheless, recent advances in incremental image sensors and compressed sensing, as well as the demand for low-complexity scene analysis in sensor networks, now challenge these assumptions. In this paper, we investigate an approach to compute salient points of images incrementally, i.e., the salient point detector can operate with a coarsely quantized input image representation and successively refine the result (the derived salient points) as the image precision is successively refined by the sensor. This has the advantage that the image sensing and the salient point detection can be terminated at any input image precision (e.g., a bound set by the sensory equipment, by the available computation, or by the salient point accuracy required by the application) and the salient points obtained under this precision are readily available. We focus on the popular detector proposed by Harris and Stephens and demonstrate how such an approach can operate when the image samples are refined in a bitwise manner, i.e., the image bitplanes are received one-by-one from the image sensor. We estimate the energy required for image sensing as well as the computation required for the salient point detection based on stochastic source modeling. The computation and energy required by the proposed incremental refinement approach are compared against a conventional salient-point detector realization that operates directly on each source precision and cannot refine the result. Our experiments demonstrate the feasibility of incremental approaches for salient point detection in various classes of natural images. In addition, we present a first comparison between the results obtained by the intermediate detectors, as well as a novel application for adaptive low-energy image sensing based on points of saliency.
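    A minimal sketch of the bitplane-refinement idea described above (not the paper's incremental algorithm): the Harris-Stephens response is simply recomputed on progressively finer bitplane truncations of an 8-bit image. The test image, window size and constant k = 0.05 are illustrative assumptions; only numpy is used.

```python
import numpy as np

def box_filter(a, win=3):
    """Separable box filtering (sums the derivative products over a local window)."""
    k = np.ones(win) / win
    a = np.apply_along_axis(np.convolve, 0, a, k, mode="same")
    return np.apply_along_axis(np.convolve, 1, a, k, mode="same")

def harris_response(img, k=0.05, win=3):
    """Harris-Stephens response det(M) - k*trace(M)^2 of the local
    autocorrelation matrix M of the image derivatives."""
    img = img.astype(np.float64)
    Iy, Ix = np.gradient(img)
    Sxx = box_filter(Ix * Ix, win)
    Syy = box_filter(Iy * Iy, win)
    Sxy = box_filter(Ix * Iy, win)
    return Sxx * Syy - Sxy * Sxy - k * (Sxx + Syy) ** 2

def truncate_bitplanes(img8, kept):
    """Keep only the 'kept' most significant bitplanes of an 8-bit image."""
    return img8 & (0xFF & ~((1 << (8 - kept)) - 1))

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in test image

for planes in (2, 4, 6, 8):              # coarse-to-fine input precision
    R = harris_response(truncate_bitplanes(image, planes))
    n = int((R > 0.01 * R.max()).sum())
    print(f"{planes} bitplanes -> {n} candidate salient points")
```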

    Core Failure Mitigation in Integer Sum-of-Product Computations on Cloud Computing Systems

    The decreasing mean-time-to-failure estimates in cloud computing systems indicate that multimedia applications running on such environments should be able to mitigate an increasing number of core failures at runtime. We propose a new roll-forward failure-mitigation approach for integer sum-of-product computations, with emphasis on generic matrix multiplication (GEMM) and convolution/cross-correlation (CONV) routines. Our approach is based on the production of redundant results within the numerical representation of the outputs via the use of numerical packing. This differs from all existing roll-forward solutions that require a separate set of checksum (or duplicate) results. Our proposal imposes a 37.5% reduction in the maximum output bitwidth supported in comparison to integer sum-of-product realizations performed on 32-bit integer representations, which is comparable to the bitwidth requirement of checksum methods for multiple core failure mitigation. Experiments with state-of-the-art GEMM and CONV routines running on a c4.8xlarge compute-optimized instance of Amazon Web Services Elastic Compute Cloud (AWS EC2) demonstrate that the proposed approach is able to mitigate up to one quad-core failure while achieving processing throughput that is: 1) comparable to that of the conventional, failure-intolerant, integer GEMM and CONV routines; 2) substantially superior to that of the equivalent roll-forward failure-mitigation method based on checksum streams. Furthermore, when used within an image retrieval framework deployed over a cluster of AWS EC2 spot (i.e., low-cost albeit terminatable) instances, our proposal leads to: 1) 16%-23% cost reduction against the equivalent checksum-based method and 2) more than 70% cost reduction against conventional failure-intolerant processing on AWS EC2 on-demand (i.e., higher-cost albeit guaranteed) instances.
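    A toy illustration of numerical packing for integer sum-of-products (a sketch of the general idea, not the paper's exact packing scheme): two input streams are superimposed at disjoint bit positions of a single 64-bit word, so one dot-product pass yields both results within the numerical representation of the output. The offset and value ranges below are assumptions chosen so the two fields cannot overlap.

```python
import numpy as np

# Toy numerical packing for integer sum-of-products (illustrative, not the
# paper's exact construction). Two independent input streams are packed into
# one 64-bit word at disjoint bit positions; a single dot product then yields
# both results, and the duplicate copy can serve as the redundant output.

OFFSET = 24                      # assumed packing offset: each partial result
                                 # must stay below 2**OFFSET to avoid overlap

rng = np.random.default_rng(1)
x = rng.integers(0, 64, size=16, dtype=np.int64)   # stream 1
y = rng.integers(0, 64, size=16, dtype=np.int64)   # stream 2 (e.g. a redundant copy)
w = rng.integers(0, 64, size=16, dtype=np.int64)   # shared coefficients

packed = x + (y << OFFSET)                          # pack both streams in-place
packed_result = int(np.dot(packed, w))              # one sum-of-products pass

res_x = packed_result & ((1 << OFFSET) - 1)         # unpack lower field
res_y = packed_result >> OFFSET                     # unpack upper field

assert res_x == int(np.dot(x, w)) and res_y == int(np.dot(y, w))
print(res_x, res_y)
```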

    Mitigating Silent Data Corruptions In Integer Matrix Products: Toward Reliable Multimedia Computing On Unreliable Hardware

    The generic matrix multiply (GEMM) routine comprises the compute- and memory-intensive part of many information retrieval, machine learning and object recognition systems that process integer inputs. Therefore, it is of paramount importance to ensure that integer GEMM computations remain robust to silent data corruptions (SDCs), which stem from accidental voltage or frequency overscaling, or other hardware non-idealities. In this paper, we introduce a new method for SDC mitigation based on the concept of numerical packing. The key difference between our approach and all existing methods is the production of redundant results within the numerical representation of the outputs, rather than as a separate set of checksums. Importantly, unlike well-known algorithm-based fault tolerance (ABFT) approaches for GEMM, the proposed approach can reliably detect the locations of the vast majority of all possible SDCs in the results of GEMM computations. An experimental investigation of voltage-scaled integer GEMM computations for visual descriptor matching within state-of-the-art image and video retrieval algorithms running on an Intel i7-4578U 3 GHz processor shows that SDC mitigation based on numerical packing leads to comparable or lower execution and energy-consumption overhead in comparison to all other alternatives.
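    A companion sketch of SDC detection with packed duplicates (again an illustration of the concept, not the paper's construction): each output element of an integer GEMM carries a redundant copy of itself in its upper bits, so comparing the two fields pinpoints corrupted locations. The field width and the injected bit flip are assumptions.

```python
import numpy as np

# Toy sketch of SDC detection via numerical packing: each input row of A also
# carries a duplicate of itself in the upper bits of the same 64-bit words,
# so every output element of the GEMM holds two copies of the same result.
# A mismatch between the two fields flags the exact location of a corruption.

OFFSET = 26                                          # assumed field width
rng = np.random.default_rng(2)
A = rng.integers(0, 32, size=(8, 8), dtype=np.int64)
B = rng.integers(0, 32, size=(8, 8), dtype=np.int64)

A_packed = A + (A << OFFSET)                         # duplicate copy in upper field
C_packed = A_packed @ B                              # single integer GEMM

C_packed[3, 5] ^= 1 << 7                             # inject a silent bit flip

low = C_packed & ((1 << OFFSET) - 1)                 # primary results
high = C_packed >> OFFSET                            # redundant results
bad = np.argwhere(low != high)                       # detected SDC locations
print("corrupted output positions:", bad.tolist())   # -> [[3, 5]]
```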

    Generalized Numerical Entanglement For Reliable Linear, Sesquilinear And Bijective Operations On Integer Data Streams

    We propose a new technique for the mitigation of fail-stop failures and/or silent data corruptions (SDCs) within linear, sesquilinear or bijective (LSB) operations on M integer data streams (M ⩾ 3). In the proposed approach, the M input streams are linearly superimposed to form M numerically entangled integer data streams that are stored in place of the original inputs, i.e., no additional (a.k.a. "checksum") streams are used. An arbitrary number of LSB operations can then be performed in M processing cores using these entangled data streams. The output results can be extracted from any (M-K) entangled output streams by additions and arithmetic shifts, thereby mitigating K fail-stop failures (K ≤ ⌊(M-1)/2⌋), or detecting up to K SDCs per M-tuple of outputs at corresponding in-stream locations. Therefore, unlike other methods, the number of operations required for the entanglement, extraction and recovery of the results is linearly related to the number of inputs and does not depend on the complexity of the performed LSB operations. Our proposal is validated within an Amazon EC2 instance (Haswell architecture with AVX2 support) via integer matrix product operations. Our analysis and experiments for fail-stop failure mitigation and SDC detection reveal that the proposed approach incurs a 0.75% to 37.23% reduction in processing throughput in comparison to the equivalent error-intolerant processing. This overhead is found to be up to two orders of magnitude smaller than that of the equivalent checksum-based method, with increased gains offered as the complexity of the performed LSB operations increases. Therefore, our proposal can be used in distributed systems, unreliable multicore clusters and safety-critical applications, where robustness against failures and SDCs is a necessity.
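    A heavily simplified toy of the entanglement/extraction step for M = 3 streams and K = 1 fail-stop failure (a cyclic packing stand-in, not the paper's linear superposition, and no LSB operation is applied to the entangled streams here): each stored stream superimposes two inputs, so any two surviving streams still yield all three originals via masks and arithmetic shifts. The field width S is an assumption.

```python
import numpy as np

# Toy illustration of the entanglement idea for M = 3 integer streams and
# K = 1 fail-stop failure (a simplified cyclic packing, not the paper's
# construction): each stored stream superimposes two inputs at disjoint bit
# positions, so any 2 of the 3 surviving streams still contain every input.

S = 20                                               # assumed field width
rng = np.random.default_rng(3)
x = [rng.integers(0, 1 << 16, size=8, dtype=np.int64) for _ in range(3)]

# Entangled streams stored in place of the originals (no checksum stream).
e = [x[m] + (x[(m + 1) % 3] << S) for m in range(3)]

def extract(stream):
    """Recover the two superimposed inputs with a mask and an arithmetic shift."""
    return stream & ((1 << S) - 1), stream >> S

# Suppose core 1 fails: streams e[0] and e[2] survive.
x0, x1_rec = extract(e[0])                           # e[0] holds x0 and x1
x2, x0_dup = extract(e[2])                           # e[2] holds x2 and a copy of x0
assert all(np.array_equal(a, b) for a, b in zip((x0, x1_rec, x2), x))
assert np.array_equal(x0, x0_dup)                    # redundant copy agrees
print("all three input streams recovered from two surviving entangled streams")
```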

    Cloud Instance Management and Resource Prediction For Computation-as-a-Service Platforms

    Computation-as-a-Service (CaaS) offerings have gained traction in the last few years due to their effectiveness in balancing between the scalability of Software-as-a-Service and the customisation possibilities of Infrastructure-as-a-Service platforms. To function effectively, a CaaS platform must have three key properties: (i) reactive assignment of individual processing tasks to available cloud instances (compute units) according to availability and predetermined time-to-completion (TTC) constraints; (ii) accurate resource prediction; (iii) efficient control of the number of cloud instances servicing workloads, in order to optimize between completing workloads in a timely fashion and reducing resource utilization costs. In this paper, we propose three approaches that satisfy these properties (respectively): (i) a service rate allocation mechanism based on proportional fairness and TTC constraints; (ii) Kalman-filter estimates for resource prediction; and (iii) the use of additive increase multiplicative decrease (AIMD) algorithms (well known as the congestion-control mechanism of the Transmission Control Protocol) for the control of the number of compute units servicing workloads. The integration of our three proposals into a single CaaS platform is shown to provide more than 27% reduction in Amazon EC2 spot instance cost against methods based on reactive resource prediction, and a 38% to 60% reduction in billing cost against the current state of the art in CaaS platforms (Amazon Lambda and Autoscale).
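    A minimal sketch of the AIMD control of the number of compute units (illustrative constants, not the paper's tuned controller or its proportional-fairness and Kalman-filter components): the instance count grows additively while TTC targets are met and is cut multiplicatively on a deadline miss.

```python
# Minimal AIMD sketch for scaling cloud compute units (illustrative constants,
# not the paper's tuned controller): add instances while time-to-completion
# (TTC) targets are met, halve them when a workload misses its deadline.

def aimd_step(instances, ttc_met, add=1, mult=0.5, floor=1):
    """One additive-increase / multiplicative-decrease control update."""
    if ttc_met:
        return instances + add                # additive increase
    return max(floor, int(instances * mult))  # multiplicative decrease

instances = 4
for ttc_met in [True, True, True, False, True, False, True]:  # simulated feedback
    instances = aimd_step(instances, ttc_met)
    print(f"TTC met: {ttc_met!s:5} -> instances now {instances}")
```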

    Toward Generalized Psychovisual Preprocessing For Video Encoding

    Deep perceptual preprocessing has recently emerged as a new way to enable further bitrate savings across several generations of video encoders without breaking standards or requiring any changes in client devices. In this article, we lay the foundation for a generalized psychovisual preprocessing framework for video encoding and describe one of its promising instantiations that is practically deployable for video-on-demand, live, gaming, and user-generated content (UGC). Results using state-of-the-art advanced video coding (AVC), high efficiency video coding (HEVC), and versatile video coding (VVC) encoders show that average bitrate [Bjontegaard delta-rate (BD-rate)] gains of 11%-17% are obtained over three state-of-the-art reference-based quality metrics [Netflix video multi-method assessment fusion (VMAF), structural similarity index (SSIM), and Apple advanced video quality tool (AVQT)], as well as the recently proposed no-reference International Telecommunication Union Telecommunication Standardization Sector (ITU-T) P.1204 metric. On CPU, the proposed framework is shown to be twice as fast as x264 medium-preset encoding. On GPU hardware, our approach achieves 714 frames/sec for 1080p video (below 2 ms/frame), thereby enabling its use in very-low-latency live video or game streaming applications.
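    A toy sketch of where the preprocessing sits in the pipeline (a fixed smoothing kernel stands in for the learned psychovisual network; the kernel weights and frame size are assumptions): each frame is filtered pixel-to-pixel before being handed to an unmodified AVC/HEVC/VVC encoder, so no bitstream or client changes are needed.

```python
import numpy as np

# Toy sketch of the preprocessing placement (not the proposed network): a
# pixel-to-pixel filter is applied to each frame *before* the standard encoder.
# The 3x3 kernel below is a mild detail-attenuating stand-in for the learned
# psychovisual model.

KERNEL = np.array([[0.0,   0.125, 0.0],
                   [0.125, 0.5,   0.125],
                   [0.0,   0.125, 0.0]])     # illustrative smoothing weights

def preprocess_frame(frame):
    """Apply the pixel-to-pixel filter; the output feeds an unmodified encoder."""
    h, w = frame.shape
    padded = np.pad(frame.astype(np.float64), 1, mode="edge")
    out = np.zeros_like(frame, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += KERNEL[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

frame = np.random.default_rng(4).integers(0, 256, size=(120, 160), dtype=np.uint8)
encoded_input = preprocess_frame(frame)      # would be passed to AVC/HEVC/VVC
print(encoded_input.shape, encoded_input.dtype)
```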

    Energy Harvesting for the Internet-of-Things: Measurements and Probability Models

    The success of future Internet-of-Things (IoT)-based application deployments depends on the ability of wireless sensor platforms to sustain uninterrupted operation based on environmental energy harvesting. In this paper, we deploy a multitransducer platform for photovoltaic and piezoelectric energy harvesting and collect raw data about the harvested power in commonly-encountered outdoor and indoor scenarios. We couple the generated power profiles with probability mixture models and make our data and processing code freely available to the research community for wireless sensors and IoT-oriented applications. Our aim is to provide data-driven probability models that characterize the energy production process, which will substantially facilitate the coupling of energy harvesting statistics with energy consumption models for processing and transceiver designs within upcoming IoT deployments.
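    A minimal sketch of fitting a probability mixture model to harvested-power samples, assuming scikit-learn is available and using synthetic indoor/outdoor data in place of the released measurement traces; the two-component Gaussian mixture is an illustrative choice, not necessarily the paper's model family.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Minimal sketch of fitting a probability mixture model to harvested-power
# samples (synthetic data stands in for the released measurement traces).

rng = np.random.default_rng(5)
indoor = rng.normal(0.05, 0.01, size=500)            # mW, low-light regime
outdoor = rng.normal(2.0, 0.4, size=500)             # mW, direct-light regime
power = np.concatenate([indoor, outdoor]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(power)
for w, m, v in zip(gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel()):
    print(f"weight={w:.2f}  mean={m:.3f} mW  std={np.sqrt(v):.3f} mW")
```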

    Escaping the complexity-bitrate-quality barriers of video encoders via deep perceptual optimization

    We extend the concept of learnable video precoding (rate-aware neural-network processing prior to encoding) to deep perceptual optimization (DPO). Our framework comprises a pixel-to-pixel convolutional neural network that is trained based on the virtualization of core encoding blocks (block transform, quantization, block-based prediction) and multiple loss functions representing the rate, distortion and visual quality of the virtual encoder. We evaluate our proposal with AVC/H.264 and AV1 under per-clip rate-quality optimization. The results show that DPO offers, on average, 14.2% bitrate reduction over AVC/H.264 and 12.5% bitrate reduction over AV1. Our framework is shown to improve both distortion- and perception-oriented metrics in a consistent manner, exhibiting only 3% outliers, which correspond to content with peculiar characteristics. Thus, DPO is shown to offer complexity-bitrate-quality tradeoffs that go beyond what conventional video encoders can offer.
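    A toy sketch of a multi-term training objective in the spirit of DPO (the rate, distortion and perceptual proxies and their weights below are assumptions, not the paper's virtualized-encoder losses): a precoding network would be trained to minimize a single scalar that blends the three terms.

```python
import numpy as np

# Toy multi-term objective (illustrative proxies and weights, not the paper's
# virtual-encoder losses): a rate proxy, a distortion term and a perceptual
# term are blended into one scalar loss for a precoding network.

def dpo_loss(precoded, source, lambda_rate=0.01, lambda_perc=0.5):
    residual = precoded - source
    rate_proxy = np.mean(np.abs(np.diff(precoded, axis=-1)))   # activity ~ bits
    distortion = np.mean(residual ** 2)                        # MSE term
    perceptual = np.mean(np.abs(np.gradient(residual)[0]))     # structure term
    return lambda_rate * rate_proxy + distortion + lambda_perc * perceptual

src = np.random.default_rng(6).random((32, 32))
pre = src + 0.01 * np.random.default_rng(7).random((32, 32))   # precoded frame
print(f"toy DPO loss: {dpo_loss(pre, src):.6f}")
```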

    Bandit framework for systematic learning in wireless video-based face recognition

    Video-based object or face recognition services on mobile devices have recently garnered significant attention, given that video cameras are now ubiquitous in all mobile communication devices. In one of the most typical scenarios for such services, each mobile device captures and transmits video frames over wireless to a remote computing cluster (a.k.a. "cloud" computing infrastructure) that performs the heavy-duty video feature extraction and recognition tasks for a large number of mobile devices. A major challenge of such scenarios stems from the highly-varying contention levels in the wireless transmission, as well as the variation in the task-scheduling congestion in the cloud. In order for each device to adapt the transmission, feature extraction and search parameters and maximize its object or face recognition rate under such contention and congestion variability, we propose a systematic learning framework based on multi-user multi-armed bandits. The performance loss under two instantiations of the proposed framework is characterized by the derivation of upper bounds for the achievable short-term and long-term loss in the expected recognition rate per face recognition attempt against the "oracle" solution that assumes a-priori knowledge of the system performance under every possible setting. Unlike well-known reinforcement learning techniques that exhibit very slow convergence when operating in highly-dynamic environments, the proposed bandit-based systematic learning quickly approaches the optimal transmission and cloud resource allocation policies based on feedback on the experienced dynamics (contention and congestion levels). To validate our approach, time-constrained simulation results are presented via: (i) contention-based H.264/AVC video streaming over IEEE 802.11 WLANs and (ii) principal-component based face recognition algorithms running under varying congestion levels of a cloud-computing infrastructure. Against state-of-the-art reinforcement learning methods, our framework is shown to provide 17.8%-44.5% reduction in the number of video frames that must be processed by the cloud for recognition and 11.5%-36.5% reduction in the video traffic over the WLAN.
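    A toy UCB1 bandit over a handful of transmission/feature-extraction configurations (the arm set, reward simulation and single-user setting are assumptions; the paper's multi-user formulation and regret bounds are not reproduced): the index balances the observed recognition rate with an exploration bonus.

```python
import math
import random

# Toy UCB1 bandit over a few transmission / feature-extraction configurations
# (illustrative arm set and simulated recognition-rate feedback).

random.seed(8)
arms = ["low-bitrate/fast-features", "mid-bitrate/mid-features",
        "high-bitrate/full-features"]
true_rate = [0.55, 0.70, 0.62]          # unknown expected recognition rates
counts = [0] * len(arms)
totals = [0.0] * len(arms)

for t in range(1, 2001):
    if 0 in counts:                     # play each arm once first
        a = counts.index(0)
    else:                               # UCB1 index: mean + exploration bonus
        a = max(range(len(arms)),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]))
    reward = 1.0 if random.random() < true_rate[a] else 0.0   # recognition hit?
    counts[a] += 1
    totals[a] += reward

best = max(range(len(arms)), key=lambda i: counts[i])
print("most-played configuration:", arms[best], counts)
```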